ABSTRACT: In this paper, we define and study the concept of traceable regressions. These
are sequences of regressions in joint or single responses for which a corresponding regression
graph captures not only an independence structure but represents, in addition, conditional
dependences that permit the tracing of pathways of dependence. We give the properties needed for
transforming these graphs and graphical criteria to decide whether a path in the graph induces
a dependence. The much stronger constraints on distributions that are faithful to a graph are
compared to those needed for traceable regressions.

1 Introduction and motivation

Single and joint response regressions. Sequences of regressions are arguably the
most important statistical tool in observational and interventional studies for investigating
pathways of dependences and hence development over time. In each regression, one
distinguishes response variables and regressor variables, with responses depending
on the regressors.

In applications, the substantive context determines which variable pairs are modeled
by a conditional independence and which are taken to be dependent because they are
needed in a generating process of the joint distribution. Suppose one regressor is a risk
factor for a response. Then quite different sizes of dependence strength will be relevant
if this response is the occurrence of a common cold, an infection with HIV,
or an accident in a nuclear plant, since the prevention of these risks is judged to be of
quite different importance.

There may be single or joint responses, where only the latter permit modeling
simultaneously occurring effects of an intervention. Components of joint responses may be
discrete or continuous random variables or a mixture of both types. Typically, a subset
of variables is taken as given, possibly determined by study design, and its components 
are named context variables since they describe the context or background or the 
basic features of individuals under study. 

The generated joint density factorizes into an ordered sequence of conditional densities
of the responses, which we briefly call regressions, and into a joint marginal density
of the context variables. Under mild conditions, estimation of sequences of regressions
can be decomposed into separate tasks for each response component of the factorization,
using well-developed tools such as linear or logistic regressions or conditional Gaussian 
regressions, which permit joint responses to be mixed of discrete and continuous com- 
ponent variables; see Lauritzen and Wermuth (1989), Edwards and Lauritzen (2001). 
Tailored to the requirements in many specific situations, special results are available to 
estimate the form and parameters of univariate and joint conditional distributions. 

However, many consequences of sequences of regressions can already be derived if 
one does not know or estimate the involved parameters but just uses an associated 
graph and properties of graph transformations. Relevant, important results concerning 
independences in sequences of regressions have been obtained only recently; see Sadeghi 
and Lauritzen (2012) and Wermuth and Sadeghi (2012). The additional properties 
needed to draw conclusions about induced dependences are set out in this paper. 

Sequences of regressions are an essential part of longitudinal studies, named also 
cohort or panel studies in medical, economic and social science research. Prominent 
examples are the Framingham heart study, the European Community household panel 
or the Swiss HIV cohort study. By using regression graphs, it will become possible to 
simplify analyses and interpretations of sequences of regressions and to directly compare 
dependences arising in different types of sequences of regressions for the same set of vari- 
ables, or in sequences of regressions for subsets of variables studied for subpopulations. 
The results in this paper prepare for these possibilities in applications. 

Independences and dependences given by regression graphs. Sequences of 
univariate, that is of single-response regressions, have been represented by directed 
acyclic graphs. With regression graphs, directed acyclic graphs are extended by in- 
cluding two types of undirected graph, one for joint responses, the other for joint context 
variables. Nodes of the graph represent random variables. Distinct node pairs are cou- 
pled by at most one edge so that a regression graph is one type of what in graph theory 
are called simple graphs. Each missing edge of a regression graph corresponds to a
conditional independence where the conditioning set depends on the type and position
of the missing edge; the graph is therefore also one type of independence graph.

Properties or axioms for combining independence statements have been studied by 
Dawid (1979) and Pearl (1988). Their connections to graphs have been discussed and 
modified in information theory; see Studený (2005) and Lněnička and Matúš (2007).
Different types of extensions have been proposed in the computer science literature; 
see Castillo et al. (1997), Flesch and Lucas (2007). But, for instance, by requiring a 
property called strong transitivity, one excludes even the whole family of regular joint 
Gaussian distributions. By contrast, this family forms a subclass of what we introduce 
here as traceable regressions. 

The independence structure of a graph is the set of all independence statements
implied by the graph. These are well-studied for regression graphs, but with important
results obtained only recently. For instance, a proof by Sadeghi and Lauritzen
(2012) implies equivalence of a pairwise Markov property, that is of the set of in- 
dependences attached to the missing edges of a given regression graph, to the global 
Markov property, the criterion known to give all independence statements implied by 
the graph. For two regression graphs with identical node sets and with the same set of 
coupled node pairs but with different types of edge, there is a simple graphical criterion 
to decide whether the two graphs define nevertheless the same independence structure, 
that is whether they are Markov equivalent; see Wermuth and Sadeghi (2012). 

Tracing pathways of dependence. Much less is known about the dependence 
structures that can be captured by graphs. Since graphs do not distinguish between 
additive and interactive effects of regressor variables on responses, nor between linear 
and nonlinear types of dependences, it has been argued by Wermuth and Lauritzen
(1989) that graphs may represent research hypotheses about dependent variable
pairs needed to generate the joint distribution. For this, each edge present in the 
graph indicates a conditional dependence, where the conditioning set depends on the 
type and position of the edge present, while the form of the dependence is not specified. 

For tracing pathways of dependences, dependence-inducing sequences of edges of 
different type are the focus of interest, while independences just lead to simplified 
strengthened interpretations of the relevant dependences. In this paper, we set out 
the properties of traceable regressions and show, in particular, that these properties
impose mild constraints on the types of generated distribution, in contrast to the serious
constraints required in general for faithful distributions. The notion of faithfulness was introduced
by Spirtes, Scheines and Glymour (1993) for distributions in which all independence 
statements hold that are implied by a graph and no others. 

Tracing pathways of dependence goes back to the geneticist Sewall Wright (1889-1988),
who introduced it in 1923 as path analysis for sequences of univariate linear
regressions. He suggested judging the goodness-of-fit of a research hypothesis,
represented by a directed acyclic graph, by comparing observed correlations with those
that are expected if the data had been generated over the graph. His rules for
computing expected marginal correlations trace all pathways that induce a dependence by
marginalizing.

The extension of tracing pathways of dependences, when there is conditioning on
variables in addition to marginalizing, became feasible after a first separation
criterion had been formulated by Pearl (1988) and proven by Geiger, Pearl and Verma
(1990) to give the global Markov property of directed acyclic graphs. When separation
fails, there is at least one path in the directed acyclic graph that may induce a
dependence by marginalizing over one subset of variables and conditioning on another
set. Here, such a path is said to be edge-inducing since it leads to a transformed graph.

Structure of the paper. In Section 2, we introduce and discuss dependence base 
regression graphs and traceable regressions. Section 3 contains examples of tracing paths 
and of planning future follow-up studies on the same topic so that there are no paths 
distorting a generating dependence of interest. Small Gaussian families of distributions 
are used to illustrate independence properties of traceable regressions. In Section 4, 
several discrete families of distributions are given to show how the properties of trace- 
able regressions can be violated. In Section 5, the known properties of an edge matrix 
calculus to transform graphs are collected first. These are used to derive new properties 
of transforming regression graphs and to distinguish traceable regressions from distri- 
butions that are faithful to regression graphs. A short discussion ends the paper. 

2 Definitions and terminology 

Some terminology for graphs. Most of the following definitions are standard or
evocative and listed for completeness. A graph consists of a node set $N = \{1, \dots, d_N\}$
and of edges that couple node pairs. In simple graphs, edges couple exclusively distinct
node pairs by at most one edge so that the endpoints i and k of an ik-edge never
coincide. For an ik-arrow, $i \leftarrow k$, node k is commonly named the parent of node i.

For a regression graph, $G^N_{\mathrm{reg}}$, there is an ordered partitioning of the node set as
$N = (u, v)$, where u contains the response nodes, each having possibly several parent
nodes, and v contains context nodes, none of which has a parent node; see for instance
Figure 1 below. There are three types of edge sets: $E_{\leftarrow}$ for directed dependences
of responses on their regressors, $E_{\text{- - -}}$ for undirected dependences among components of
a joint response, and $E_{\text{---}}$ for undirected dependences among context variables.

An ik-path connects the path endpoints i and k by a sequence of edges. An ik-path
can be an edge; otherwise it has distinct inner nodes such that each edge visits
an inner node once. It is an a-line path if all its inner nodes are in subset a of
N. A path of arrows is direction-preserving if all its arrows point in the same direction.

For a, b arbitrary disjoint subsets of N, one says there is a path between b and a if
one endpoint is in a and the other endpoint is in b, while we say there is a path from b
to a if the subsets are ordered as (a, b), that is if direction-preserving paths may start in
b and point to a, but not vice-versa. In a direction-preserving ik-path, node k is named
an ancestor of i and node i a descendant of k.

A subgraph, induced by a subset a of the node set N, consists of the nodes within
a and of the edges present in the graph within a. A special type of induced subgraph,
needed here, consists of three nodes and two edges. It is named a V-configuration or
just a V. Thus, a three-node path forms a V if its induced subgraph has two edges.

In a complete graph, every node pair is coupled by an edge. In a connected
subgraph, every node can be reached by a path. The connected components of a
regression graph are the disjoint connected graphs that remain when all its arrows are
removed. Nodes within a connected component are said to be concurrent nodes.
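
As an illustration of this definition, the following minimal sketch computes connected components by deleting all arrows and collecting the nodes that remain linked by undirected edges. The edge encoding `(i, k, kind)` with kinds `"arrow"`, `"dashed"` and `"full"`, the function name and the five-node example graph are assumptions made here for illustration; the example is not one of the figures of this paper.

```python
from collections import defaultdict

def connected_components(nodes, edges):
    """Return the connected components that remain after deleting all arrows.

    edges: iterable of (i, k, kind), kind in {"arrow", "dashed", "full"};
    for kind == "arrow" the edge is i <- k, that is k is a parent of i.
    """
    adj = defaultdict(set)
    for i, k, kind in edges:
        if kind != "arrow":              # keep only the undirected edges
            adj[i].add(k)
            adj[k].add(i)
    seen, comps = set(), []
    for start in nodes:
        if start in seen:
            continue
        comp, stack = set(), [start]
        while stack:                     # depth-first search along undirected edges
            n = stack.pop()
            if n in comp:
                continue
            comp.add(n)
            stack.extend(adj[n] - comp)
        seen |= comp
        comps.append(sorted(comp))
    return comps

# Hypothetical graph: responses u = {1, 2, 3}, context nodes v = {4, 5}
nodes = [1, 2, 3, 4, 5]
edges = [(1, 2, "arrow"), (2, 3, "dashed"), (2, 4, "arrow"),
         (3, 5, "arrow"), (4, 5, "full")]
components = connected_components(nodes, edges)
print(components)
```

In this hypothetical graph, node 1 is a single response, nodes 2 and 3 are concurrent response nodes and nodes 4 and 5 are concurrent context nodes.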

Generating sequences of regressions and graphs. We consider random
variables $Y_i$, which may be discrete or continuous or a mixture of both types. For a
more formal definition of the measure spaces, asked for by a referee, see for instance
Lauritzen and Wermuth (1989). The variables have labels in node set N and form a
vector variable, denoted by $Y_N$. In the following, an element i of N is not distinguished
from the singleton {i} and the union sign for combining subsets of N is often omitted.

For i, k a node pair and $c \subseteq N \setminus \{i, k\}$, we write $i \perp\!\!\!\perp k \mid c$ for $Y_i, Y_k$ conditionally
independent given $Y_c$. If such an independence constraint is satisfied by a density $f_{ikc}$,

$$ i \perp\!\!\!\perp k \mid c \iff (f_{i|kc} = f_{i|c}) \iff (f_{ik|c} = f_{i|c} f_{k|c}). $$

It has become common to say that a joint family of densities can be generated 
over a chain graph if it factorizes according to a set ordering of the nodes, called a 
chain, and satisfies all independences implied by the graph. Different types of chain 
graph and corresponding models for discrete variables are discussed by Drton (2009). 

When independence structures are the focus of interest, one starts traditionally with
the graph. Regression graphs in node set N have three types of edge sets, $E_{\leftarrow}$, $E_{\text{- - -}}$
and $E_{\text{---}}$. For a regression graph, denoted by $G^N_{\mathrm{reg}}$, there is a split of the node set as
$N = (u, v)$, so that concurrent response nodes are in u and concurrent context nodes
in v; sets of ordered concurrent nodes are denoted by $g_j$ for $j = 1, \dots, J$. Subgraphs
induced by any $g_j$ are undirected. Undirected subgraphs induced by $g_j$ within v have
full-line edges $i \,\text{---}\, k$ and are commonly called concentration graphs. Undirected subgraphs
induced by $g_j$ within u have dashed-line edges $i \,\text{- - -}\, k$ and are called covariance graphs.

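For $g_j$ in u, nodes in $g_{>j} = g_{j+1} \cup g_{j+2} \cup \dots \cup g_J$ are said to be in the past of $g_j$.
Arrows may start from any node, except from those in $g_1$, but never point to a node in
$g_{>j}$. With $g_{>J} = \emptyset$, the basic factorization of $f_N$ generated over a regression graph is

$$ f_N = f_{u|v} f_v \quad \text{with} \quad f_{u|v} = \prod_{g_j \subseteq u} f_{g_j | g_{>j}} \quad \text{and} \quad f_v = \prod_{g_j \subseteq v} f_{g_j}. \tag{1} $$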

Several orderings of gj may give the same factorization as explained below for Figure 1. 

Here, tracing of pathways is of main interest, hence we start instead with a stepwise
generating process of $f_N$ for which $N = (u, v)$ and one ordering of $g_j$ is fixed.
In this process, the density of variables in $g_J$ is generated first, the one of $g_{J-1}$ given $g_J$
next, up to the density of $g_1$ given $g_{>1}$. Then, variable pairs needed to generate $f_N$ give
the edge set of $G^N_{\mathrm{reg}}$ with Definition 1 below and the factorization of equation (1) results.

For a variable pair $Y_i, Y_k$ needed in the generating process of $f_N$, we say it
is conditionally dependent given $Y_c$ for some $c \subseteq N \setminus \{i, k\}$ and write $i \pitchfork k \mid c$. A
graph is edge-minimal for a distribution generated over it if every missing edge in the
graph corresponds to a conditional independence statement and every edge present to
a dependence. A family of densities $f_N$ generated over an edge-minimal graph changes
if any one edge is removed from the graph.

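Definition 1. Defining pairwise dependences of $G^N_{\mathrm{reg}}$. An edge-minimal regression
graph specifies with $g_1 < \dots < g_J$ a generating process for $f_N$, where the dependences

$$
\begin{array}{ll}
i \,\text{- - -}\, k: \; i \pitchfork k \mid g_{>j} & \text{for } i, k \text{ concurrent response nodes in } g_j \text{ of } u, \\
i \leftarrow k: \; i \pitchfork k \mid g_{>j} \setminus \{k\} & \text{for response node } i \text{ in } g_j \text{ of } u \text{ and node } k \text{ in } g_{>j}, \\
i \,\text{---}\, k: \; i \pitchfork k \mid v \setminus \{i, k\} & \text{for } i, k \text{ concurrent context nodes in } g_j \text{ of } v,
\end{array} \tag{2}
$$

define the edges present in $G^N_{\mathrm{reg}}$. The meaning of each edge missing in $G^N_{\mathrm{reg}}$ results
with the dependence sign $\pitchfork$ replaced by the independence sign $\perp\!\!\!\perp$.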

Thus, for the given order of the components $g_j$, the graph implies for each variable
pair i, k either conditional dependence or independence given the same type of conditioning
set, with $i \,\text{- - -}\, k$ for two response nodes, with $i \leftarrow k$ for i a response node in $g_j$
and k a node in the past of $g_j$, with $i \,\text{---}\, k$ for two context nodes. Notice that except for
context nodes, each pair of variables is exclusively conditioned on variables that are in
the past of the $g_j$ that contains node i. This permits modeling simultaneously occurring
effects of an intervention, which is not possible if the graph is directed acyclic or if
it is another type of chain graph.

Different generating processes may lead to the same regression graph and hence to
the same implied independence structure. Then, some components, $g_j, g_{j+1}, \dots, g_t$, say,
of $G^N_{\mathrm{reg}}$, have an interchangeable labeling because they induce disconnected undirected
subgraphs. Such components are displayed in Figure 1 within stacked boxes. In a
connected $G^N_{\mathrm{reg}}$, connected stacked components $g_j, \dots, g_t$ have the nodes in $g_{>t}$ as their
common past and nodes in $g_{<j} = g_1 \cup g_2 \cup \dots \cup g_{j-1}$ as their common future. For
a generating process, one of the possibly many compatible orderings is fixed. In
each, arrows point to response nodes in the common future but never to a node in the
common past.

In Figure 1 below, $g_7$ and $g_8$ are in v, all other connected components are in u. The
order implied by the arrows of $G^N_{\mathrm{reg}}$ remains unchanged if, for instance, the two
disconnected subgraphs induced by $g_3$ and $g_4$ are interchanged or if they are replaced
by a single dashed-line complete graph in nodes {4, 5, 6, 7}.

Recall that connected components of $G^N_{\mathrm{reg}}$ are uniquely obtained as the connected
subgraphs that remain after deleting all arrows from the regression graph and keeping
the undirected edges and all nodes. Thus, for any given graph, it is not necessary to show
stacked boxes, but they are sometimes included to reflect the first ordering, the prior
knowledge about possibly joint responses and joint context variables. By convention,
we number nodes and components $g_j$ of $G^N_{\mathrm{reg}}$ first from top to bottom, then from left to
right. In Figure 1, $g_3 = \{4, 5, 6\}$ and $g_8 = \{12, 13, 14\}$ contain three nodes, each of $g_2$
and $g_6$ contains two nodes, all others contain a single node.




[Figure 1: graph not reproduced; boxes from left to right: {g_1}, {g_2}, {g_3, g_4}, {g_5, g_6}, {g_7, g_8}]

Figure 1: A regression graph in 14 nodes and node set partitioned into 8 connected components;
single responses in $g_1, g_4, g_5$ and joint responses in $g_2, g_3, g_6$; context variables in $g_7, g_8$.



Single responses correspond in the statistical model to univariate regressions, joint 
responses to multivariate regressions, including the seemingly unrelated regressions of 
Zellner (1962). In Figure 1, seemingly unrelated regressions belong to the subgraphs
induced by each of the three node sets {2, 3, 5, 6}, {5, 6, 8, 9}, {9, 10, 13, 14}. 

General and special properties of probability distributions. For i, h, k single,
distinct indices and a, b, c, d disjoint subsets of index set N, where only d may be empty,
there are the common independence properties (i) to (iv) which are satisfied by all
probability distributions. The discussed properties (v) to (viii) constrain distributions,
but they permit the use of just the graph to derive different types of consequences for
families of distributions generated over $G^N_{\mathrm{reg}}$.

(i) symmetry: $a \perp\!\!\!\perp b \mid c \iff b \perp\!\!\!\perp a \mid c$,

(ii) contraction: $(a \perp\!\!\!\perp b \mid cd \text{ and } b \perp\!\!\!\perp c \mid d) \iff ac \perp\!\!\!\perp b \mid d$,

(iii) decomposition: $a \perp\!\!\!\perp bc \mid d \implies (a \perp\!\!\!\perp b \mid d \text{ and } a \perp\!\!\!\perp c \mid d)$,

(iv) weak union: $a \perp\!\!\!\perp bc \mid d \implies (a \perp\!\!\!\perp b \mid cd \text{ and } a \perp\!\!\!\perp c \mid bd)$.

Joint distributions, for which the reverse implications of (iii) and of (iv) hold, have as
additional properties, respectively,

(v) composition: $(a \perp\!\!\!\perp b \mid d \text{ and } a \perp\!\!\!\perp c \mid d) \implies a \perp\!\!\!\perp bc \mid d$,
(vi) intersection: $(a \perp\!\!\!\perp b \mid cd \text{ and } a \perp\!\!\!\perp c \mid bd) \implies a \perp\!\!\!\perp bc \mid d$.

Properties (v) and (vi) are needed to derive the independence structure implied by $G^N_{\mathrm{reg}}$.
Two further types of properties are to be considered for tracing pathways of dependence,

(vii) set transitivity: $(a \perp\!\!\!\perp b \mid d \text{ and } a \perp\!\!\!\perp b \mid cd) \implies (a \perp\!\!\!\perp c \mid d \text{ or } b \perp\!\!\!\perp c \mid d)$,
(viii) singleton transitivity: $(i \perp\!\!\!\perp k \mid d \text{ and } i \perp\!\!\!\perp k \mid hd) \implies (i \perp\!\!\!\perp h \mid d \text{ or } k \perp\!\!\!\perp h \mid d)$.



Thus, distributions that satisfy set transitivity are also singleton-transitive, since c may
contain only one element. Singleton transitivity requires that, for conditional independence
of $Y_i, Y_k$ to hold both given $Y_d$ and given $Y_h, Y_d$, there has to be at least one additional
independence given $Y_d$ involving $Y_h$, the additional variable in the conditioning set. It is
unfortunate that, in the literature, the term weak transitivity has sometimes been used
for property (vii) and sometimes for (viii).

We shall show that set transitivity, (vii), is used in addition to (i) to (vi) in
transformations of $G^N_{\mathrm{reg}}$ by which no edge of the starting graph gets removed and by which
an edge criterion for the global Markov property is obtained, while only singleton
transitivity, (viii), is needed in addition to (i) to (vi) to decide, with a given edge-minimal
$G^N_{\mathrm{reg}}$, whether a path is inducing a dependence for its path endpoints or not.

Singleton transitivity is a feature of what we define below as traceable regressions.
So far, it had been known to be common to all positive binary distributions where,
for instance, for ($1 \pitchfork 2$ and $1 \pitchfork 3$) either $2 \perp\!\!\!\perp 3$ can hold or $2 \perp\!\!\!\perp 3 \mid 1$ but not both; see
Simpson (1951). It also holds in all regular Gaussian distributions; see for instance
Studený (2005), Corollary 2.5 in Section 2.3.6.

On the other hand, set transitivity imposes stronger constraints on any specific dis- 
tribution; see for instance the discussion of Figure 1 for trivariate binary distributions 
in Wermuth, Marchetti and Cox (2009). It also excludes some regular Gaussian families
of distributions such as the following.

A regular Gaussian family violating set transitivity. For $N = (u, v)$, let $Y_u$
and $Y_v$ be mean-centered vector variables with a joint Gaussian distribution. Let them
have equal dimension, dim, the components of $Y_u$ and of $Y_v$ be mutually independent
and all elements in the covariance matrix $\mathrm{cov}(Y_u, Y_v) = \Sigma_{uv}$ be nonzero; then every
component of $Y_u$ is dependent on every component in $Y_v$ and

$$ \mathrm{cov}(Y_u) = \Sigma_{uu} \text{ diagonal}, \qquad \mathrm{cov}(Y_v) = \Sigma_{vv} \text{ diagonal}. $$

Let further the components of $Y_u$ have equal variances $\omega > 1$ and the equal variances of
the components of $Y_v$ be $\kappa > \omega + 1$. Whenever in the described situation $\Sigma_{uv}$ is orthogonal,
the joint covariance matrix is invertible so that the joint distribution is regular and the
marginal independences carry over to conditional independences so that also

$$ \mathrm{cov}(Y_u \mid Y_v) = \Sigma_{uu|v} \text{ diagonal}, \qquad \mathrm{cov}(Y_v \mid Y_u) = \Sigma_{vv|u} \text{ diagonal}. $$

Set transitivity is always violated, for a split $v = (a, b)$, $c = \{1, \dots, \mathrm{dim}\}$ and $d = \emptyset$.
This family extends the example in equation (8) of Cox and Wermuth (1993). 
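
The violation can be checked numerically for the smallest case. The sketch below assumes dimension 2, an orthogonal cross-covariance matrix with entries $\pm 1/\sqrt{2}$ and illustrative common variances; the names `omega`, `kappa` and `Q` and all numerical values are choices made here, not values taken from the cited example.

```python
import math

dim = 2
omega, kappa = 2.0, 4.0            # assumed common variances within Y_u and within Y_v
s = 1.0 / math.sqrt(2.0)
Q = [[s, s], [s, -s]]              # assumed Sigma_uv: orthogonal, every entry nonzero

def cond_cov_v_given_u(r, c):
    """Entry (r, c) of cov(Y_v | Y_u) = Sigma_vv - Sigma_vu Sigma_uu^{-1} Sigma_uv,
    using Sigma_uu = omega * I and Sigma_vv = kappa * I."""
    sigma_vv = kappa if r == c else 0.0
    return sigma_vv - sum(Q[h][r] * Q[h][c] for h in range(dim)) / omega

# Regularity: Sigma_uu and the Schur complement (kappa - 1/omega) * I are positive definite.
regular = omega > 0 and kappa - 1.0 / omega > 0

# Split v = (a, b) into its two components; c = u and d is empty.
a_indep_b = True                                          # Sigma_vv is diagonal by construction
a_indep_b_given_c = abs(cond_cov_v_given_u(0, 1)) < 1e-12
a_indep_c = all(abs(Q[h][0]) < 1e-12 for h in range(dim))
b_indep_c = all(abs(Q[h][1]) < 1e-12 for h in range(dim))

premise = a_indep_b and a_indep_b_given_c
conclusion = a_indep_c or b_indep_c
violates_set_transitivity = regular and premise and not conclusion
print(violates_set_transitivity)
```

Both independences in the premise of set transitivity hold here, while a and b each remain dependent on c, so that the conclusion fails.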
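Some important properties of $G^N_{\mathrm{reg}}$ and $f_N$. Two basic types of Vs in $G^N_{\mathrm{reg}}$ need
to be distinguished. There are collision Vs:

$$ i \,\text{- - -}\, \circ \leftarrow k, \qquad i \rightarrow \circ \leftarrow k, \qquad i \,\text{- - -}\, \circ \,\text{- - -}\, k, $$

and transmitting Vs:

$$ i \leftarrow \circ \leftarrow k, \quad i \leftarrow \circ \,\text{---}\, k, \quad i \,\text{---}\, \circ \,\text{---}\, k, \quad i \leftarrow \circ \rightarrow k, \quad i \leftarrow \circ \,\text{- - -}\, k. $$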

Recall that two different graphs in the same node set are Markov equivalent if they 
define nevertheless the same independence structure, that is the set of all independences 
implied by the graph. The skeleton of a graph consists of its nodes and its set of edges, 
irrespective of the type of edge. It results by replacing each edge present by a full line. 

Lemma 1. Markov equivalence. (Wermuth and Sadeghi, 2012). Two regression 
graphs with the same skeleton are Markov equivalent if and only if their sets of collision 
Vs are identical. 
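
The criterion of Lemma 1 can be sketched in code as follows. The edge encoding `(i, k, kind)`, where an `"arrow"` stands for i ← k and undirected edges are `"dashed"` or `"full"`, and all function names are assumptions made here for illustration. An inner node is treated as a collision node if each of its two edges is a dashed line or an arrow pointing to it.

```python
from itertools import combinations

def endpoint_marks(edges):
    """Map each ordered pair (o, n) to True if the edge between o and n
    is a dashed line or an arrow pointing to o."""
    hits = {}
    for i, k, kind in edges:              # "arrow" means i <- k
        hits[(i, k)] = kind in ("arrow", "dashed")
        hits[(k, i)] = kind == "dashed"
    return hits

def collision_vs(edges):
    """Set of collision Vs (i, o, k) with i < k and uncoupled endpoints."""
    hits = endpoint_marks(edges)
    nbrs = {}
    for o, n in hits:
        nbrs.setdefault(o, set()).add(n)
    result = set()
    for o, ns in nbrs.items():
        for i, k in combinations(sorted(ns), 2):
            if (i, k) in hits:            # endpoints coupled: not a V
                continue
            if hits[(o, i)] and hits[(o, k)]:
                result.add((i, o, k))
    return result

def skeleton(edges):
    return {frozenset((i, k)) for i, k, _ in edges}

def markov_equivalent(edges1, edges2):
    """Apply the collision-V criterion to two graphs with the same skeleton."""
    if skeleton(edges1) != skeleton(edges2):
        raise ValueError("the criterion is stated for graphs with the same skeleton")
    return collision_vs(edges1) == collision_vs(edges2)

chain = [(1, 2, "arrow"), (2, 3, "arrow")]        # 1 <- 2 <- 3
fork = [(1, 2, "arrow"), (3, 2, "arrow")]         # 1 <- 2 -> 3
collider = [(2, 1, "arrow"), (2, 3, "arrow")]     # 1 -> 2 <- 3
dashed_v = [(1, 2, "dashed"), (2, 3, "dashed")]   # 1 - - - 2 - - - 3
print(markov_equivalent(chain, fork), markov_equivalent(chain, collider))
```

In these three-node examples, the chain and the fork contain only a transmitting V, whereas the two-arrow collision V and the dashed-line V share the same single collision V.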

A more compact characterization of the pairwise independences in Definition 1 is
based on the notion of anterior paths. Recall first that with $N = (u, v)$, there are only
undirected full-line paths within v and there are only arrows pointing from v to u. An
anterior ik-path is either a descendant-ancestor ik-path, or a context-nodes ik-path,
or a descendant-ancestor iq-path with a context-nodes qk-path attached to it,

$$ i \leftarrow \underbrace{\overbrace{\circ \leftarrow \dots \leftarrow q}^{\text{ancestors of } i} \,\text{---}\, \circ \,\text{---}\, \dots \,\text{---}\, k}_{\text{anteriors of } i}. $$

We denote the joint set of anteriors of nodes i, k by $\mathrm{ant}_{ik} = \{\mathrm{ant}_i \cup \mathrm{ant}_k\} \setminus \{i, k\}$.
Similarly, for any subset c of N, the anterior set of nodes within c is denoted by $\mathrm{ant}_c$.

The intersection (vi) and the composition property (v) are needed for Lemmas 2 and
3. By using them, the independences attached to the missing edges of $G^N_{\mathrm{reg}}$ in Definition
1 reduce to the more compact statements $i \perp\!\!\!\perp k \mid \mathrm{ant}_{ik}$ and this leads to the definition of
an active path in $G^N_{\mathrm{reg}}$ due to Sadeghi (2009) for a more general class of graphs.

Let {a, b, c, m} partition N, where c denotes a conditioning set of interest for a, b and
m the set of nodes to be ignored, that is to be marginalized over. Only c, m or both may
be empty sets. A path in $G^N_{\mathrm{reg}}$ is active given c if of its inner nodes, every collision
node is in $c \cup \mathrm{ant}_c$ and every transmitting node is in m. For graph transformation, such
a path is also said to be edge-inducing.

Lemma 2. Global Markov property of $G^N_{\mathrm{reg}}$. (Sadeghi, 2009). The regression
graph implies $a \perp\!\!\!\perp b \mid c$ if and only if there is no active path in $G^N_{\mathrm{reg}}$ between a and b
given c.
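
The definition of an active path can be sketched for a single given path as follows. This is a minimal illustration under assumptions made here: edges are encoded as `(i, k, kind)` with `"arrow"` standing for i ← k, the path is given as a list of nodes whose endpoints are not in c, and anteriors are found by walking arrows backwards and along full lines. The function names and the example graph are hypothetical.

```python
def anteriors(targets, edges):
    """Nodes reachable from `targets` by walking arrows backwards (to parents)
    or along full lines; the targets themselves are excluded."""
    step = {}
    for i, k, kind in edges:              # "arrow" means i <- k
        if kind == "arrow":
            step.setdefault(i, set()).add(k)
        elif kind == "full":
            step.setdefault(i, set()).add(k)
            step.setdefault(k, set()).add(i)
    found, stack = set(), list(targets)
    while stack:
        n = stack.pop()
        for p in step.get(n, ()):
            if p not in found:
                found.add(p)
                stack.append(p)
    return found - set(targets)

def is_active(path, edges, c):
    """Check whether `path` is active given the conditioning set c."""
    hits = {}
    for i, k, kind in edges:
        hits[(i, k)] = kind in ("arrow", "dashed")   # arrowhead or dash at i
        hits[(k, i)] = kind == "dashed"              # tail or full line at k
    c = set(c)
    allowed = c | anteriors(c, edges)
    for prev, o, nxt in zip(path, path[1:], path[2:]):
        collision = hits[(o, prev)] and hits[(o, nxt)]
        if collision and o not in allowed:
            return False      # collision node must be in c or anterior to c
        if not collision and o in c:
            return False      # transmitting node must be marginalized over
    return True

# Hypothetical graph: 1 -> 3 <- 2 and 3 -> 4, written as (child, parent, "arrow")
edges = [(3, 1, "arrow"), (3, 2, "arrow"), (4, 3, "arrow")]
print(is_active([1, 3, 2], edges, c=[]), is_active([1, 3, 2], edges, c=[4]))
```

In this example, the collision path between nodes 1 and 2 is not active when c is empty and becomes active when conditioning on node 3 or on node 4, to which node 3 is anterior.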
Lemma 3. Equivalence of the pairwise and the global Markov property.
(Sadeghi and Lauritzen, 2012). The independence structure of $G^N_{\mathrm{reg}}$ is equivalently
defined by its lists of the three types of missing edges and by its global Markov property.

To make Vs dependence inducing, we take an edge-minimal regression graph for
$f_N$, assume properties (i) to (vi) and, in addition, property (viii), that is singleton
transitivity. We then say $G^N_{\mathrm{reg}}$ is a dependence base for $f_N$ since the implications of
this type of graph can be derived with respect to both independences and dependences.
We note first that by enumeration in Definition 1, the inner node of each collision V is
excluded from the defining conditioning set for its endpoints, while the inner node of
each transmitting V is included in it. This observation is generalized with Lemma 4.

Lemma 4. The conditioning set of any independence statement implied by $G^N_{\mathrm{reg}}$ for the
endpoints of any of its Vs includes the inner node if it is a transmitting V and excludes
the inner node if it is a collision V.

Proof. The statement results directly with Lemma 2. □ 

Let now a V in a dependence base $G^N_{\mathrm{reg}}$ have endpoints i, k and inner node o. Then
by Definition 1 and Lemma 4, there is at least one c with $c \subseteq N \setminus \{i, k, o\}$ such that
$i \perp\!\!\!\perp k \mid c$ is implied if (i, o, k) is a collision V and $i \perp\!\!\!\perp k \mid oc$ if (i, o, k) is a transmitting V.

Proposition 1. Dependence inducing Vs. For (i, o, k) any V of a dependence base
$G^N_{\mathrm{reg}}$ and each $c \subseteq N \setminus \{i, k, o\}$ such that this regression graph implies one of $i \perp\!\!\!\perp k \mid c$ or
$i \perp\!\!\!\perp k \mid oc$, the following two equivalent statements hold:

$$ (i, o, k) \text{ forms a collision V} \iff (i \perp\!\!\!\perp k \mid c \implies i \pitchfork k \mid oc), $$

$$ (i, o, k) \text{ forms a transmitting V} \iff (i \perp\!\!\!\perp k \mid oc \implies i \pitchfork k \mid c). $$

Proof. For $c = \emptyset$, collision Vs are Markov equivalent and transmitting Vs are Markov
equivalent by Lemma 1. By edge-minimality, both edges of any V indicate conditional
dependence for pairs i, o and k, o and by Definition 1, $i \perp\!\!\!\perp k$ holds for an inner collision
node and $i \perp\!\!\!\perp k \mid o$ for an inner transmitting node. Including the inner node of a collision
V into the conditioning set, or excluding the inner node of a transmitting V from the
conditioning set, generates an active path by Lemma 2. Such a path induces a dependence
unless singleton transitivity is violated, which contradicts an assumption. Similarly for
$c \neq \emptyset$, an independence is implied by $G^N_{\mathrm{reg}}$ if there is no active path between i and k
given c by Lemma 4, but an active path is generated just as for $c = \emptyset$. □

We can now define sequences of regressions that permit the tracing of pathways of
dependence for $f_N$ when a, b, c, d denote disjoint subsets of N and only d may be empty.

Definition 2. Traceable regressions. We say $f_N$ results from traceable regressions if

1. it could have been generated over a dependence base regression graph, $G^N_{\mathrm{reg}}$,

2. it has three equivalent decompositions of the joint independence $b \perp\!\!\!\perp ac \mid d$:

(i) $(b \perp\!\!\!\perp a \mid cd \text{ and } b \perp\!\!\!\perp c \mid d)$, (ii) $(b \perp\!\!\!\perp a \mid d \text{ and } b \perp\!\!\!\perp c \mid d)$, (iii) $(b \perp\!\!\!\perp a \mid cd \text{ and } b \perp\!\!\!\perp c \mid ad)$,

3. dependence-inducing Vs of $G^N_{\mathrm{reg}}$ are also dependence-inducing for $f_N$.

Decompositions (i) to (iii) in Definition 2 combine the previously discussed
properties (ii) to (vi). Symmetry of independences, that is property (i), holds trivially as
in all probability distributions. Undirected edges correspond to symmetric dependence
statements. For each arrow $i \leftarrow k$ in $G^N_{\mathrm{reg}}$, symmetry of dependence holds only in the
following weak sense. From Definition 1 for i in $g_j$, there is some $c \subseteq g_{>j} \setminus k$ with
$f_{i|kc} \neq f_{i|c}$ used in the generating process. Then, for $Y_k$ regressed instead on $Y_i, Y_c$, also

$$ f_{k|ic} \neq f_{k|c}. $$

Notice that traceable regressions behave like regular Gaussian families generated over
an edge-minimal $G^N_{\mathrm{reg}}$. Therefore, for traceable regressions, a violation of set transitivity
can occur only when there are at least two paths connecting the same node pair; see
the family of regular Gaussian distributions given above that violates set transitivity
and, for further examples, Wermuth and Cox (1998). We call these special types of
parametric constellations path cancellations as they result for a pair i, k after combining
dependences induced by active ik-paths in such a way that the joint contributions of all
paths cancel.
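
A path cancellation can be illustrated with a minimal sketch for three variables. It assumes a linear generating process with independent residuals, $Y_2 = cY_3 + \varepsilon_2$ and $Y_1 = aY_2 + bY_3 + \varepsilon_1$, so that node 1 is reached from node 3 by the arrow 1 ← 3 and by the path 1 ← 2 ← 3. The coefficient names a, b, c, the function name and the numerical values are chosen here for illustration only.

```python
def marginal_cov_13(a, b, c, var3=1.0):
    """cov(Y1, Y3) for Y2 = c*Y3 + e2 and Y1 = a*Y2 + b*Y3 + e1,
    with residuals e1, e2 independent of each other and of Y3."""
    direct = b * var3            # contribution of the arrow 1 <- 3
    indirect = a * c * var3      # contribution of the path 1 <- 2 <- 3
    return direct + indirect

a, c = 0.5, 0.8
generic = marginal_cov_13(a, 0.3, c)          # generic coefficients
cancelled = marginal_cov_13(a, -a * c, c)     # b = -a*c: the two contributions cancel
print(generic, cancelled)
```

With b = -ac, all three edges are present and each indicates a dependence, yet the marginal covariance of $Y_1$ and $Y_3$ vanishes. This is a special constellation of parameter values, not a consequence of the graph.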

3 Applications and illustrations of traceable regressions 

Tracing paths. Whenever a pathway of dependence is traced in terms of a graph, one 
uses implicitly that every edge present is a strong enough dependence to be of interest 
in the given substantive context and that every V along a path is dependence-inducing 
for its endpoints, since otherwise, no dependence is implied for the path endpoints. 

Figure 2 shows a well-fitting regression graph for nine features observed for patients.
The regression graph represents a research hypothesis on the sets of regressors needed 
for each response to generate the joint distribution. In this example, we use data of 
Kappesser (1997) on 201 chronic pain patients, where variable descriptions and detailed 
statistical analyses are given in Wermuth and Sadeghi (2012), not in this paper. 

The graph does not contain any information on the types of the dependence, but 
supplemented by estimates for the dependences, one can use the graph to interpret 
pathways of dependences. For instance, the path $Y, Z_a, A, B$ leads, together with the
parameters estimated with linear and logistic models, to the following interpretations.

Figure 2: Regression graph, well compatible with the data and resulting from statistical
analyses. Binary variables are indicated by dots, variables treated as continuous by circles.

Patients with a higher level of formal schooling are more likely to have head or neck 
pain than back pain. For patients with head or neck pain, the intensity of pain is better 
reduced after treatment than for the back pain patients. For lower pain intensity scores 
after treatment, treatment is the more successful the lower the pain intensity. For higher 
pain intensity scores after treatment, there are no systematic changes in Y. 

The graphs in Figure 3 are consequences of the generating graph in Figure 2. Figure
3a) implies that site of pain, A, would show a direct effect on Y if the two symptoms of
chronic pain before and after treatment were either not measured or just omitted from
the list of potentially important regressors. Similarly, chronicity of pain, U, would show
a direct effect on Y if, in addition, site of pain, A, is omitted as in Figure 3b).



Figure 3: The graph of Figure 2 transformed, preserving the original ordering for the remaining 
variables by a) marginalizing over symptoms before and after treatment, $X_a, Z_a, Z_b, X_b$; b) 
marginalizing over symptoms before and after treatment and, in addition, over site of pain, A. 

To derive and interpret transformed graphs well, such as those in Figure 3, and more 
complex graphs involving both marginalizing and conditioning, one needs to know the 
general properties of transforming regression graphs and realize that in general, induced 
dependences may not be reflected in significant statistical test results, in particular for 
small sample sizes or weak dependences attached to edges in the generating graph. 

Planning future follow-up studies. To show how tracing of active paths may lead 
to an improved planning of follow-up studies, we use the generating process represented 
by the graph in Figure 4, adapted from Robins and Wasserman (1997), and assume that 
all dependences that correspond to edges present in the graph are strong. 




Figure 4: Generating process in five variables, missing edge for $(T_p, U)$ due to full randomized 
allocation of individuals to treatments, and missing edges for $(T_r, U)$ and $(T_r, T_p)$ due to 
randomization conditionally on A; U expected to be unobserved in a follow-up study. 

Suppose that in the planned study, it will be possible to observe all variables of 
Figure 4 except for U, because the tools needed to diagnose the health status, U, before 
treatment will not be available. Marginalizing over U is indicated in Figure 4 by a 
crossed-out node. Then U is excluded from any conditioning set for Y, the main 
response of interest. In general, whenever no active path is generated, one may proceed 
safely with estimating an effect, a dependence of main interest, in the follow-up study. 

With U unobserved, the dependence of Y on the past treatment $T_p$ will always be 
modified, since by excluding also the intermediate outcome, A, and recent treatment, $T_r$, 
from the list of regressors, one generates the active path $Y, T_r, A, T_p$, while by including 
either $T_r$ or A or both as regressors for Y, one generates the active path $Y, U, A, T_p$; see 
Lemma 2. The former is an example of an overall effect deviating from a conditional 
effect and the latter is an example of indirect confounding. 

If, on the other hand, the dependence of Y on the recent treatment, $T_r$, is of main 
interest, then $T_p$ is a common ancestor and the path $Y, T_p, A, T_r$ becomes active by 
marginalizing over the inner nodes; an example of direct confounding. But no 
active path is generated between Y and $T_r$ when A and $T_p$ are regressors in addition to 
$T_r$, so that the conditional dependence of Y on $T_r$ given $A, T_p$ can be estimated. 

Even though it may in principle be possible to recover the generating dependence 
given some distributional assumptions; see e.g. Wermuth and Cox (2008), one needs to 
obtain very precise estimates to make any correction worthwhile since poorly estimated 
parameters may also lead to bad corrections. 

Both types of confounding can also be detected using graphical criteria on trans- 
formed graphs in reduced node sets, named summary graphs; see Wermuth (2011). For 
constructing summary graphs by removing repeatedly single nodes, one needs to take 
into account that any given node can be a collision node on one path and a transition 
node on another path. This contrasts with the graph transformations in this paper, 
where different types of active paths are closed in sequence. 

Examples of small Gaussian regression graph models. We illustrate next the 
intersection and the composition property by describing two different types of complete 
regression graphs in three nodes and the associated saturated models in the special case 
of regular families of Gaussian distributions for variables standardized to have zero mean 
and unit variance. Parameters are attached to the edges of the graphs. Example I shows 
that the intersection property is implicitly used with backward selections of important 
regressors in multiple regressions and Example II how the composition property is rele- 
vant for selecting important regressors in multivariate regressions. 

Example I: a complete single response graph with two context variables. The 
following complete graph in nodes 1, 2, 3 

[Graph: response node 1 with arrows pointing to it from the context nodes 2 and 3, which are coupled by a full line] 

defines implicitly for standardized Gaussian variables $Y_1, Y_2, Y_3$ three nonzero parameters 
measuring dependence in 

$$E(Y_1 \mid Y_2, Y_3) = \alpha Y_2 + \delta Y_3, \qquad E(Y_2 Y_3) = \rho_{23}, \qquad \sigma^{23.1} = -\rho_{23}/(1 - \rho_{23}^2),$$ 

where $\rho_{23}$ denotes the marginal correlation of $Y_2, Y_3$ and $\sigma^{23.1}$ the concentration in their 
bivariate distribution, that is after marginalizing over $Y_1$. For this complete graph, 
$\alpha \neq 0$ means $1 \pitchfork 2 \mid 3$, $\delta \neq 0$ means $1 \pitchfork 3 \mid 2$, and $\sigma^{23.1} \neq 0$ means $2 \pitchfork 3$. With $\alpha = \delta = 0$, 
one requires $1 \perp\!\!\!\perp 2 \mid 3$ and $1 \perp\!\!\!\perp 3 \mid 2$ and removes the 12-edge and the 13-edge from the 
complete graph so that node 1 remains isolated from the coupled pair 2, 3. For the resulting graph, 
the seemingly obvious interpretation $1 \perp\!\!\!\perp (2, 3)$ requires the intersection property. 
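
The relation between the marginal correlation and the concentration in Example I can be checked numerically. The following is a minimal sketch, assuming NumPy is available; the value chosen for `rho23` is purely illustrative and not taken from the paper.

```python
import numpy as np

rho23 = 0.6  # hypothetical marginal correlation of Y2 and Y3

# Correlation matrix of (Y2, Y3), that is after marginalizing over Y1.
sigma_23 = np.array([[1.0, rho23],
                     [rho23, 1.0]])

# The concentration matrix is the inverse of the covariance matrix.
conc_23 = np.linalg.inv(sigma_23)

# Off-diagonal concentration, compared with -rho23 / (1 - rho23^2).
sigma_conc = conc_23[0, 1]
closed_form = -rho23 / (1.0 - rho23**2)
print(sigma_conc, closed_form)
```

The concentration vanishes exactly when the correlation does, which is why a nonzero value can be attached to the 23-edge in either parametrization.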

Example II: a complete joint response graph with a single regressor. The 
following complete graph 

[Graph: response nodes 1 and 2 coupled by a dashed line, with arrows pointing to each of them from node 3] 

defines for standardized Gaussian variables three non-vanishing parameters, $\beta, \gamma, \sigma_{12|3}$, 
in 

$$E(Y_1 \mid Y_3) = \beta Y_3, \qquad E(Y_2 \mid Y_3) = \gamma Y_3, \qquad \mathrm{cov}(Y_1, Y_2 \mid Y_3) = \sigma_{12|3}.$$ 

Here, $\sigma_{12|3} \neq 0$ means $1 \pitchfork 2 \mid 3$, $\beta \neq 0$ means $1 \pitchfork 3$, and $\gamma \neq 0$ means $2 \pitchfork 3$. With 
$\beta = \gamma = 0$, one requires $1 \perp\!\!\!\perp 3$ and $2 \perp\!\!\!\perp 3$ and removes the 13-edge and the 23-edge from 
the complete graph so that node 3 remains isolated from the coupled pair 1, 2. For the resulting graph, 
the interpretation $(1, 2) \perp\!\!\!\perp 3$ requires the composition property. 

Standard properties for combining independences. Properties (ii) to (iv), 
which are common to all probability distributions with a given density, are illustrated 
next by using the directed acyclic graphs in the three ordered nodes (1, 2, 3) shown in 
Figure 5, again for standardized Gaussian distributions. 

Example III: a complete directed acyclic graph. The complete graph in nodes 
1, 2, 3 of Figure 5a) gives for standardized Gaussian variables three nonzero parameters, 
$\alpha, \delta, \gamma$, measuring dependence in 

$$E(Y_1 \mid Y_2, Y_3) = \alpha Y_2 + \delta Y_3, \qquad E(Y_2 \mid Y_3) = \gamma Y_3, \qquad E(Y_3) = 0,$$ 

where $\alpha \neq 0$ means $1 \pitchfork 2 \mid 3$, $\delta \neq 0$ means $1 \pitchfork 3 \mid 2$, and $\gamma \neq 0$ means $2 \pitchfork 3$. 




Figure 5: Directed acyclic graphs in 3 nodes with parameters in standardized Gaussian 
distributions attached to the edges; a) the complete graph, b) the graph implying $1 \perp\!\!\!\perp 2 \mid 3$, c) the 
graph implying $2 \perp\!\!\!\perp (1, 3)$. 

The interpretation of $\delta$ changes to: $\delta \neq 0$ means $1 \pitchfork 3$ in Figure 5b), where $1 \perp\!\!\!\perp 2 \mid 3$ is 
implied by the graph. This reflects that a different family of distributions is generated 
when the 12-edge is removed. The graphs define implicitly the factorizations of the joint 
density, as in equation (1), respectively, as 

$$f_{123} = f_{1|23} f_{2|3} f_3, \qquad (f_{123} = f_{1|3} f_{2|3} f_3) \implies 1 \perp\!\!\!\perp 2 \mid 3, \qquad (f_{123} = f_{1|3} f_2 f_3) \implies 2 \perp\!\!\!\perp (1, 3).$$ 

The factorization of a joint density as specified with a complete directed acyclic 
graph is formally always possible. Independence constraints imposed in sequence on 
two consecutive factors of $f_{123}$ generated as in Figure 5a), such as $1 \perp\!\!\!\perp 2 \mid 3$ constraining 
$f_{1|23} = f_{1|3}$, change the triangle in the graph of Figure 5a) to a V in Figure 5b), and 
$2 \perp\!\!\!\perp 3$ constraining $f_{2|3} = f_2$ creates next an isolated node 2 and $1 \leftarrow 3$, in Figure 5c). 

The removal of the two arrows gives one direction of the contraction property; starting 
from the factorization to Figure 5c) gives the other direction. Given the factorization 
of any density to Figure 5c), marginalizing over $Y_3$ leaves $f_{12} = f_1 f_2$ and marginalizing 
over $Y_1$ gives directly $f_{23} = f_2 f_3$, that is decomposition, while conditioning on $Y_2, Y_3$ 
leaves directly $f_{1|23} = f_{1|3}$ and conditioning on $Y_1, Y_2$ gives $f_{3|12} = f_{3|1}$, that is weak 
union. Also in more complex situations, these three properties, (ii), (iii), (iv), common 
to all probability distributions, can be derived by transforming factorized densities. 
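
The decomposition and weak union statements derived from the factorization to Figure 5c) can be illustrated for discrete variables. The sketch below is an illustration under stated assumptions, not part of the original analysis: it builds a joint probability table of the form $f_{1|3} f_2 f_3$ from arbitrary random positive factors (the numbers of levels and the random seed are hypothetical choices) and checks the four derived equalities numerically.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, for reproducibility only

# Hypothetical numbers of levels: Y1 and Y2 binary, Y3 with three levels.
f3 = rng.uniform(0.1, 1.0, size=3)
f3 /= f3.sum()                                # f_3(k)
f2 = rng.uniform(0.1, 1.0, size=2)
f2 /= f2.sum()                                # f_2(j)
f1_3 = rng.uniform(0.1, 1.0, size=(2, 3))
f1_3 /= f1_3.sum(axis=0, keepdims=True)       # f_{1|3}(i | k)

# Joint density factorized as in Figure 5c): f_123 = f_{1|3} f_2 f_3.
f = np.einsum('ik,j,k->ijk', f1_3, f2, f3)

# Decomposition: marginalizing over Y3 and over Y1.
f1 = f.sum(axis=(1, 2))
decomposition_12 = np.allclose(f.sum(axis=2), np.outer(f1, f2))
decomposition_23 = np.allclose(f.sum(axis=0), np.outer(f2, f3))

# Weak union: f_{1|23} = f_{1|3} and f_{3|12} = f_{3|1}.
f1_23 = f / f.sum(axis=0, keepdims=True)
weak_union_1 = np.allclose(f1_23, f1_3[:, None, :])
f13 = f.sum(axis=1)
f3_1 = f13 / f13.sum(axis=1, keepdims=True)
f3_12 = f / f.sum(axis=2, keepdims=True)
weak_union_3 = np.allclose(f3_12, f3_1[:, None, :])

print(decomposition_12, decomposition_23, weak_union_1, weak_union_3)
```

Because the equalities follow from the factorization alone, they hold for any choice of the three factors, which is the sense in which these properties are common to all probability distributions.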
4 Violating properties of traceable regressions 

Some small discrete families of distribution are given that are not traceable regressions. 
These may be extended and many similar families may be constructed. 

Violation of singleton transitivity. As mentioned before, singleton transitivity 
is satisfied in all regular Gaussian distributions and in all binary distributions. But 
the following discrete family of distributions for a 2 x 2 x 3 contingency table violates 
singleton transitivity. It is adapted from Birch (1963), equation (5.4). We write $\pi_{ijk}$ 
for the joint probabilities of variables A, B, C at levels i, j, k and e.g. $\pi_{+jk} = \sum_i \pi_{ijk}$. 
Conditional probabilities e.g. for A given B, C are $\pi_{i|jk} = \pi_{ijk}/\pi_{+jk}$. 

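Table 1: A family of distributions that violates singleton transitivity 

Entries are $4 \pi_{ijk} (1 + \alpha + \alpha^2)$, with $\alpha > 1$. 

              C: k = 1                k = 2                   k = 3 
A/B:          j = 1       j = 2       j = 1       j = 2       j = 1       j = 2 
i = 1         $\alpha^2$  $\alpha$    $\alpha$    1           1           $\alpha^2$ 
i = 2         $\alpha$    1           $\alpha^2$  $\alpha$    1           $\alpha^2$ 
odds-ratio          1                       1                       1 
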

Here, the conditional odds ratios being 1 imply that $A \perp\!\!\!\perp B \mid C$ and the marginal 
probabilities of A, C and of B, C show that $A \pitchfork C$ and $B \pitchfork C$. Nevertheless, also 
$A \perp\!\!\!\perp B$ since $\pi_{ij+} = 1/4$ for all i, j, 
a very special constellation discussed first by Darroch (1962) and generalized by 
Wermuth and Cox (2004), section 7, to general types of distributions that are also not 
dependence inducing. Though one can construct families of distributions with such 
peculiar parametric constraints, it is difficult to imagine that they could capture a structure 
of interest in any substantive context when studying sequences of regressions. 
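
The statements about Table 1 can be verified numerically for any single member of the family. The following minimal sketch assumes NumPy and the arbitrary illustrative value `alpha = 2.0`; the helper `is_independent` is a hypothetical name introduced here for checking independence in a two-way table.

```python
import numpy as np

alpha = 2.0  # any value larger than 1; chosen only for illustration

# cells[i, j, k] holds 4 * pi_ijk * (1 + alpha + alpha^2) as in Table 1.
cells = np.empty((2, 2, 3))
cells[:, :, 0] = [[alpha**2, alpha], [alpha, 1.0]]
cells[:, :, 1] = [[alpha, 1.0], [alpha**2, alpha]]
cells[:, :, 2] = [[1.0, alpha**2], [1.0, alpha**2]]
pi = cells / (4 * (1 + alpha + alpha**2))

def is_independent(table):
    """True if a two-way probability table equals the product of its margins."""
    expected = np.outer(table.sum(axis=1), table.sum(axis=0))
    return np.allclose(table, expected)

# Conditional odds ratios of A, B at each level k of C.
odds_ratios = [pi[0, 0, k] * pi[1, 1, k] / (pi[0, 1, k] * pi[1, 0, k])
               for k in range(3)]

indep_AB = is_independent(pi.sum(axis=2))  # marginal A, B table
indep_AC = is_independent(pi.sum(axis=1))  # marginal A, C table
indep_BC = is_independent(pi.sum(axis=0))  # marginal B, C table
print(odds_ratios, indep_AB, indep_AC, indep_BC)
```

With A and B each dependent on C, yet independent of each other both marginally and given C, the node C is not dependence-inducing for the pair A, B.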

In a generating process of $f_N$, singleton transitivity can be achieved when the 
individual regressions are permitted to vary independently of the other response components 
and of their common past. This is reached, in particular, when the family to a complete 
graph has a rich enough parametrisation and only the independence constraints of 
Definition 1 are imposed on it. 

Violation of the intersection property. The intersection property is always 
satisfied by positive distributions and in all regular Gaussian distributions, even 
though the known necessary and sufficient conditions are less restrictive; see San Martin, 
Mouchart and Rolin (2005). The following discrete family of distributions for a 2 x 2 x 3 
contingency table violates the intersection property. This happens whenever a pair of 
variables shares some common information. For three binary variables, violation of the 
intersection property coincides with the degenerate case of two variables being identical. 

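Table 2: A family of distributions that violates the intersection property 

Entries are $3 \pi_{ijk}$, with $0 < \alpha \neq \beta < 1$, $2\alpha + \beta < 1$. 

              C: k = 1                k = 2                   k = 3 
A/B:          j = 1       j = 2       j = 1       j = 2       j = 1       j = 2 
i = 1         $\alpha$    0           $\alpha$    0           0           $\beta$ 
i = 2         $1-\alpha$  0           $1-\alpha$  0           0           $1-\beta$ 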

In the family shown in Table 2, $A \perp\!\!\!\perp B \mid C$ and $A \perp\!\!\!\perp C \mid B$, since 

$$\pi_{i|jk} = \pi_{i|k} \quad \text{and} \quad \pi_{i|jk} = \pi_{i|j},$$ 

but $A \pitchfork BC$. More precisely, $A \pitchfork B$ since $\pi_{i|j} \neq \pi_i$ and $A \pitchfork C$ since $\pi_{i|k} \neq \pi_i$. The 
marginal joint distribution of B, C shows the type of common information shared by 
variables B and C. Variable B taking on level 1 coincides with C taking on value 1 or 
2 and B being at level 2 coincides with C being at level 3. 

Thus, when the joint distribution of B, C had been generated by first knowing the 
distribution of variable C and then generating the conditional distribution of B given 
C, the levels of variable B are not permitted to vary freely and thereby lead to the 
violation of the intersection property. 
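
The violation of the intersection property can be checked for one member of a family of the type in Table 2. The sketch below assumes that the cells not listed in the table are structural zeros, as implied by the description of how the levels of B and C coincide; the values of `alpha` and `beta` are hypothetical. Conditional probabilities are compared only on the support of B, C.

```python
import numpy as np

alpha, beta = 0.2, 0.5  # illustrative values with alpha != beta

# cells[i, j, k] holds 3 * pi_ijk; unlisted cells are taken to be zero.
cells = np.zeros((2, 2, 3))
cells[:, 0, 0] = [alpha, 1 - alpha]   # B at level 1, C at level 1
cells[:, 0, 1] = [alpha, 1 - alpha]   # B at level 1, C at level 2
cells[:, 1, 2] = [beta, 1 - beta]     # B at level 2, C at level 3
pi = cells / 3.0

bc = pi.sum(axis=0)                                   # P(B = j, C = k)
a_given_c = pi.sum(axis=1) / pi.sum(axis=(0, 1))      # P(A = i | C = k)
a_given_b = pi.sum(axis=2) / pi.sum(axis=(0, 2))      # P(A = i | B = j)
a_marginal = pi.sum(axis=(1, 2))                      # P(A = i)

# Level combinations of B, C that occur with positive probability.
support = [(j, k) for j in range(2) for k in range(3) if bc[j, k] > 0]
a_given_bc = {(j, k): pi[:, j, k] / bc[j, k] for (j, k) in support}

indep_A_B_given_C = all(np.allclose(a_given_bc[j, k], a_given_c[:, k])
                        for (j, k) in support)
indep_A_C_given_B = all(np.allclose(a_given_bc[j, k], a_given_b[:, j])
                        for (j, k) in support)
indep_A_BC = all(np.allclose(a_given_bc[j, k], a_marginal)
                 for (j, k) in support)
print(indep_A_B_given_C, indep_A_C_given_B, indep_A_BC)
```

Both conditional independences hold while A depends on the pair B, C jointly, so the two statements cannot be combined as the intersection property would require.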

Violation of the composition property. The composition property is always 
satisfied in regular Gaussian distributions and in multivariate symmetric binary dis- 
tributions generated over directed acyclic graphs; see Wermuth, Marchetti and Cox 
(2009). On the other hand, it is always violated when pairwise independences do not 
imply mutual independence. 

The following binary family of distributions for a 2 x 2 x 2 contingency table also 
violates the composition property. In this family, there is a log-linear three-factor 
interaction since the conditional odds ratios for A, B differ at the two levels of C. 

Table 3: A family of distributions that violates the composition property 

Entries are $8 \pi_{ijk}$, with $0 < 2\alpha < 1$. 

              C: k = 1                          k = 2 
A/B:          j = 1         j = 2               j = 1       j = 2 
i = 1         $1+2\alpha$   $1-2\alpha$         1           1 
i = 2         $1-2\alpha$   $1+2\alpha$         1           1 
odds-ratio    $\{(1+2\alpha)/(1-2\alpha)\}^2$         1 


More precisely, at level 2 of C, the variables A, B are independent while the 
dependence of this pair is strong at level 1 of C whenever $\alpha$ is large. At the same time, the 
marginal AC and BC tables reveal that $A \perp\!\!\!\perp C$ and $B \perp\!\!\!\perp C$. 

Thus, when regressing the two components of a joint response AB separately on C, 
one sees no separate effects, but the conditional dependence of A on B changes with 
the levels of C. This type of structure could in particular not be generated by a single 
unobserved common explanatory variable or if all sets of variables with higher-order 
effects also have main effects in the regressions, that is lowest order interactions. 
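
The claims about the family in Table 3 can be reproduced for one parameter value. This is a minimal sketch assuming NumPy and the illustrative value `alpha = 0.25`; the helper `is_independent` is a hypothetical name for a product-of-margins check.

```python
import numpy as np

alpha = 0.25  # illustrative value with 0 < 2 * alpha < 1

# cells[i, j, k] holds 8 * pi_ijk as in Table 3.
cells = np.empty((2, 2, 2))
cells[:, :, 0] = [[1 + 2 * alpha, 1 - 2 * alpha],
                  [1 - 2 * alpha, 1 + 2 * alpha]]
cells[:, :, 1] = [[1.0, 1.0], [1.0, 1.0]]
pi = cells / 8.0

def is_independent(table):
    """True if a two-way probability table equals the product of its margins."""
    expected = np.outer(table.sum(axis=1), table.sum(axis=0))
    return np.allclose(table, expected)

indep_AC = is_independent(pi.sum(axis=1))   # marginal A, C table
indep_BC = is_independent(pi.sum(axis=0))   # marginal B, C table

# Treat the pair (A, B) as one variable with four levels and test it against C.
indep_AB_C = is_independent(pi.reshape(4, 2))

odds_ratio_k1 = pi[0, 0, 0] * pi[1, 1, 0] / (pi[0, 1, 0] * pi[1, 0, 0])
odds_ratio_k2 = pi[0, 0, 1] * pi[1, 1, 1] / (pi[0, 1, 1] * pi[1, 0, 1])
print(indep_AC, indep_BC, indep_AB_C, odds_ratio_k1, odds_ratio_k2)
```

Each response component is independent of C on its own, while the joint response is not, which is exactly the situation excluded by the composition property.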

With a pragmatic strategy for model selection in which one checks for higher order 
interactions only when there are also main effects, one may overlook such structures that 
could be of substantive interest. For sequences of discrete joint responses, the violation 
will be detected when using the parametrization suggested by Marchetti and Lupparelli 
(2011). In general, the graphical checks for nonlinearities and interactions, as proposed 
by Cox and Wermuth (1994), provide some protection, but only for effects that are 
detectable also in marginal trivariate distributions. 

5 Transforming regression graphs 

The transformations of regression graphs to be introduced are based on binary matrix 
representations of $G^N_{\mathrm{reg}}$. Our notation for these edge matrices mimics the one for 
parameter matrices in Gaussian sequences of regressions generated over the graph. There are 
one-to-one correspondences between a zero in an edge matrix, a vanishing parameter in 
the regular Gaussian family of distributions and a conditional independence statement. 

Linear sequences of regressions. For a mean-centered vector variable $Y_N$ with a 
regular Gaussian distribution generated over $G^N_{\mathrm{reg}}$ with a split $N = (u, v)$, the matrix of 
equation parameters, denoted by $H_{NN}$, is upper block-triangular and 

$$H_{NN} Y_N = \eta_N \quad \text{with } W_{NN} = \mathrm{cov}(\eta_N) \text{ block-diagonal in the sizes of } g_j,$$ 

where the submatrix of $H_{uu}$ in rows $g_j$ and columns $g_{>j}$ is $-\Pi_{g_j|g_{>j}}$, the negative of the 
population least-squares coefficient matrix obtained when regressing $Y_{g_j}$ on $Y_{g_{>j}}$. The 
square diagonal submatrices in the sizes of $g_j$ are identity matrices. The submatrix $H_{vv}$ 
is the marginal concentration matrix of $Y_v$, denoted by $\Sigma^{vv.u}$. This implies $W_{vv} = \Sigma^{vv.u}$. 
The square submatrices of $W_{uu}$ are $\Sigma_{g_j g_j|g_{>j}}$, the conditional covariance matrices of $Y_{g_j}$ 
given $Y_{g_{>j}}$. For just two connected components a, b the parameter matrices are 

$$H_{NN} = \begin{pmatrix} I_{aa} & -\Pi_{a|b.v} & -\Pi_{a|v.b} \\ 0_{ba} & I_{bb} & -\Pi_{b|v} \\ 0_{va} & 0_{vb} & \Sigma^{vv.ab} \end{pmatrix}, \qquad W_{NN} = \begin{pmatrix} \Sigma_{aa|bv} & 0_{ab} & 0_{av} \\ 0_{ba} & \Sigma_{bb|v} & 0_{bv} \\ 0_{va} & 0_{vb} & \Sigma^{vv.ab} \end{pmatrix},$$ 

where we use a Yule-Cochran notation for $\Pi_{a|bv}$, the regression coefficient matrix of $Y_a$ 
regressed on $Y_b, Y_v$; for instance, $0_{ba}$ denotes a matrix of zeros and $I_{bb}$ an identity matrix. 
For any split of $N = (a, b)$, to obtain $f_{a|b} f_b$ we let $c = a \cap u$, $d = b \cap u$, and get 

$$K_{NN} = \begin{pmatrix} H_{aa}^{-1} & -H_{aa}^{-1} H_{ab} \\ H_{ba} H_{aa}^{-1} & H_{bb} - H_{ba} H_{aa}^{-1} H_{ab} \end{pmatrix}, \qquad Q_{uu} = \begin{pmatrix} W_{cc} - W_{cd} W_{dd}^{-1} W_{dc} & W_{cd} W_{dd}^{-1} \\ -W_{dd}^{-1} W_{dc} & W_{dd}^{-1} \end{pmatrix}$$ 

by partial inversion of $H_{NN}$ with respect to a and by partial inversion of $W_{uu}$ with 
respect to b; see for instance Marchetti and Wermuth (2009), Appendix 1. 
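
The block formula for partial inversion displayed above can be written as a small function. The following is a minimal sketch, not the authors' implementation: the function name `partial_inversion`, the zero-based index lists and the example matrix are assumptions made for illustration. It checks two standard algebraic properties of the operator, namely that applying it twice with respect to the same index set restores the matrix, and that inverting with respect to a and then with respect to the remaining indices gives the ordinary inverse.

```python
import numpy as np

def partial_inversion(M, a):
    """Partial inversion of a square matrix M with respect to index list a.

    With b the remaining indices, the result has blocks
      (a, a):  inv(M_aa)            (a, b): -inv(M_aa) M_ab
      (b, a):  M_ba inv(M_aa)       (b, b):  M_bb - M_ba inv(M_aa) M_ab
    """
    M = np.asarray(M, dtype=float)
    a = list(a)
    b = [i for i in range(M.shape[0]) if i not in a]
    Maa_inv = np.linalg.inv(M[np.ix_(a, a)])
    out = np.empty_like(M)
    out[np.ix_(a, a)] = Maa_inv
    out[np.ix_(a, b)] = -Maa_inv @ M[np.ix_(a, b)]
    out[np.ix_(b, a)] = M[np.ix_(b, a)] @ Maa_inv
    out[np.ix_(b, b)] = (M[np.ix_(b, b)]
                         - M[np.ix_(b, a)] @ Maa_inv @ M[np.ix_(a, b)])
    return out

# Hypothetical invertible example matrix.
M = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 4.0]])
a, b = [0], [1, 2]

twice = partial_inversion(partial_inversion(M, a), a)   # undoes itself
in_steps = partial_inversion(partial_inversion(M, a), b)  # full inverse
print(np.allclose(twice, M), np.allclose(in_steps, np.linalg.inv(M)))
```

Applied with respect to the second block of indices, the same function reproduces the block pattern shown for $Q_{uu}$, since the formula depends only on which indices are selected.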

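Lemma 5. Orthogonalised linear equations. (Wermuth and Cox (2004), Thm 1.) 
The Gaussian density $f_N = f_{u|v} f_v$ generated over $G^N_{\mathrm{reg}}$ is for any split $N = (a, b)$ 
transformed into $f_N = f_{a|b} f_b$ with $E(Y_a \mid Y_b) = \Pi_{a|b} Y_b$, $\mathrm{cov}(Y_a \mid Y_b) = \Sigma_{aa|b}$, $\mathrm{con}(Y_b) = \Sigma^{bb.a}$ 
as 

$$\Pi_{a|b} = K_{ab} + K_{aa} Q_{ab} K_{bb}, \qquad (3)$$ 

$$\Sigma_{aa|b} = K_{aa} Q_{aa} K_{aa}^{T}, \qquad \Sigma^{bb.a} = H_{bb}^{T} Q_{bb} H_{bb}. \qquad (4)$$ 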

The edge matrices of regression graphs. Edge matrices are binary matrix 
representations of graphs. They are symmetric for undirected graphs, upper block-triangular 
for arrows in a generating regression graph and upper-triangular for directed acyclic graphs. The 
essential change compared to the more traditionally used adjacency matrices is that ones 
are added along the diagonal of each square matrix. This has the effect that sums of 
matrix products are well-defined and can represent the closing of special types of path 
in graphs, such as in equations (8) and (9) below. 

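Regression graphs have three types of edge sets, $E_{\leftarrow}$, $E_{\text{- - -}}$, and $E_{\text{---}}$. The edge 
matrix components of $G^N_{\mathrm{reg}}$ are a $d_N \times d_N$ upper block-triangular matrix $\mathcal{H}_{NN} = (\mathcal{H}_{ik})$ 
such that 

$$\mathcal{H}_{ik} = \begin{cases} 1 & \text{if and only if } i \leftarrow k \text{ or } i \text{ --- } k \text{ in } G^N_{\mathrm{reg}} \text{ or } i = k, \\ 0 & \text{otherwise,} \end{cases} \qquad (5)$$ 

and a $d_u \times d_u$ symmetric matrix $\mathcal{W}_{uu} = (\mathcal{W}_{ik})$ such that 

$$\mathcal{W}_{ik} = \begin{cases} 1 & \text{if and only if } i \text{ - - - } k \text{ in } G^N_{\mathrm{reg}} \text{ or } i = k, \\ 0 & \text{otherwise,} \end{cases} \qquad (6)$$ 

where $E_{\text{- - -}}$ corresponds to $\mathcal{W}_{uu}$, $E_{\text{---}}$ to $\mathcal{H}_{vv}$, and $E_{\leftarrow}$ to $\mathcal{H}_{uN}$. 

Every regression graph can be represented by its edge matrices given in equations 
(5) and (6). Every dependence base $G^N_{\mathrm{reg}}$ defines in particular a corresponding 
family of Gaussian regressions in which each edge present can be identified by a single 
non-vanishing parameter, an off-diagonal element of $H_{NN}$ or $W_{uu}$. 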

Partial closure of paths. Partial closure, introduced by Wermuth, Wiedenbeck 
and Cox (2006), is a matrix operator, denoted by $\mathrm{zer}_a(\cdot)$, which acts on rows and columns 
a of a binary matrix. It is applied to edge matrix representations of a starting graph in 
node set N to give the edge matrix representations of a new graph in which there is an 
additional ik-edge for a pair i, k that is in the starting graph uncoupled but connected 
by a specific type of edge-inducing a-line path. 

With partial closure, the set of nodes, node labels, and edges present in the starting 
graph are preserved in the transformed graph so that the mappings are graph 
homomorphisms; for this notion see Hell and Nešetřil (2004), for corresponding reparametrizations 
of exponential families see Wiedenbeck and Wermuth (2010). 

Lemma 6. Basic properties of partial closure. (Wermuth, Wiedenbeck and Cox, 
(2006)). Partial closure is (i) commutative, (ii) cannot be undone and (iii) is 
exchangeable with selecting a submatrix. 

By property (i), it is enough, for some purposes, to show how the operator acts on a 
single node. By property (ii), independences can be removed but never reintroduced so 
that these transformations satisfy set transitivity. Property (iii) justifies node and edge 
reductions since closing edge-inducing a-line paths in a large graph and then selecting a 
square submatrix for a subset containing a, gives the same result as selecting the square 
submatrix first and then closing the a-line paths. 

Because of property (i), one can always permute the rows and columns of an edge matrix 
$\mathcal{F}$ and start partial closure with node i corresponding to position (1,1). Then for $b = N \setminus \{i\}$, 

$$\mathrm{zer}_i\, \mathcal{F} = \mathrm{In}\!\left[ \begin{pmatrix} 1 & \mathcal{F}_{ib} \\ \mathcal{F}_{bi} & \mathcal{F}_{bb} + \mathcal{F}_{bi} \mathcal{F}_{ib} \end{pmatrix} \right], \qquad (7)$$ 

which says that particular Vs in the graph are closed which have node i as inner node. 
In the three small examples of Figure 6, an edge for node pair 1, 3 is induced with i = 2. 




Figure 6: Dependence base, 3-node graphs: a V in a) a directed acyclic, b) a covariance, c) a 
concentration graph; an active path (1,2,3) induces in a) $1 \pitchfork 3$, in b) $1 \pitchfork 3 \mid 2$, and in c) $1 \pitchfork 3$. 

Applying $\mathrm{zer}_i$ to the edge matrix of a directed acyclic graph, covariance graph or 
concentration graph mimics, respectively, the recursion relation for regression coefficients, 
covariances and concentrations; discussed for instance in Wermuth and Cox (1998). 

By letting the edge induced by the three Vs in Figure 6 'remember the type of 
edge at the path endpoints', the induced edges become, respectively, 

a) $1 \leftarrow 3$, b) $1 \text{ - - - } 3$, c) $1 \text{ --- } 3$. 
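
Partial closure with respect to a single node can be sketched directly from the block form in equation (7): every product of an entry in column i with an entry in row i adds a possible edge, and the indicator function maps positive entries to one. The sketch below is an illustration only; the function name `zer_single`, the zero-based node index and the example edge matrices are assumptions. In particular, the directed acyclic V is taken to be the transmitting V with arrows pointing from 3 to 2 and from 2 to 1, which is the form compatible with an upper-triangular edge matrix in the order (1, 2, 3).

```python
import numpy as np

def zer_single(F, i):
    """Close all Vs with inner node i in a binary edge matrix F.

    F must have ones on its diagonal. Adding the outer product of column i
    and row i reproduces the block F_bb + F_bi F_ib and leaves row i and
    column i unchanged once positive entries are mapped to one.
    """
    F = np.asarray(F, dtype=int)
    closed = F + np.outer(F[:, i], F[i, :])
    return (closed > 0).astype(int)

# V in a directed acyclic graph: upper-triangular edge matrix.
dag_v = np.array([[1, 1, 0],
                  [0, 1, 1],
                  [0, 0, 1]])
# V in a covariance or concentration graph: symmetric edge matrix.
sym_v = np.array([[1, 1, 0],
                  [1, 1, 1],
                  [0, 1, 1]])

dag_closed = zer_single(dag_v, 1)   # inner node 2 has zero-based index 1
sym_closed = zer_single(sym_v, 1)

# Commutativity, property (i), on a chain of four nodes.
chain = np.array([[1, 1, 0, 0],
                  [0, 1, 1, 0],
                  [0, 0, 1, 1],
                  [0, 0, 0, 1]])
order_23 = zer_single(zer_single(chain, 1), 2)
order_32 = zer_single(zer_single(chain, 2), 1)
print(dag_closed, sym_closed, np.array_equal(order_23, order_32), sep="\n")
```

In both small examples a one appears in position (1, 3) of the transformed matrix, and no entry that was one in the starting matrix is changed, in line with property (ii).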

The transformation $\mathrm{zer}_a(\mathcal{F})$ means that all Vs along a-line paths represented by the 
edge matrix $\mathcal{F}$ are closed by an edge. The basic property (i) implies that the nodes in 
a may be chosen for this in any order. This requires in particular that the inner nodes 
of the paths of $\mathcal{F}$ are of the same type, either all are collision nodes to form collision 
paths, or all are transmitting nodes. 

Lemma 7. Partial closure applied to $G^N_{\mathrm{reg}}$. The transformation $\mathcal{K}_{NN} = \mathrm{zer}_a(\mathcal{H}_{NN})$ 
closes each a-line anterior path and $\mathcal{Q}_{uu} = \mathrm{zer}_b(\mathcal{W}_{uu})$ each dashed, b-line collision path. 

Proof. Each anterior path in $G^N_{\mathrm{reg}}$ and no other type of path is represented by $\mathcal{H}_{NN}$ and 
each dashed-line path in $G^N_{\mathrm{reg}}$ and no other type of path is represented by $\mathcal{W}_{uu}$. By 
Proposition 1, a V along the former is edge-inducing by marginalizing over its inner 
node and a V along the latter by conditioning on its inner node. Remembering the type of edge 
at the endpoints of each V on an a-line path of $\mathcal{H}_{NN}$ leads to the same induced edge for 
the endpoints of the path irrespective of the order in choosing single nodes of a. □ 

Closing active paths in regression graphs. For directed acyclic graphs, it is 
known that the path criterion on the starting graph for separation of $\alpha$ from $\beta$ given c 
can be reduced to an edge criterion after transforming first the generating graph in terms 
of partial closure and closing next the remaining paths that are relevant for deciding 
whether $\alpha \perp\!\!\!\perp \beta \mid c$ is implied; see Marchetti and Wermuth (2009). This approach is now 
extended to regression graphs and to dependences in traceable regressions. For this, we 
take the partitioning $N = \{\alpha, \beta, c, m\}$ of the node set of $G^N_{\mathrm{reg}}$, $a = \alpha \cup m$, $b = \beta \cup c$, and 

$$\mathcal{K}_{NN} = \mathrm{zer}_a \mathcal{H}_{NN}, \qquad \mathcal{Q}_{uu} = \mathrm{zer}_b \mathcal{W}_{uu}, \qquad \mathcal{Q}_{uv} = 0, \qquad \mathcal{Q}_{vv} = \mathcal{H}_{vv}.$$ 

Proposition 2. Induced edge matrices for $f_{a|b} f_b$. Sequences of regressions with 
graph $G^N_{\mathrm{reg}}$ in node set $N = (u, v)$ and generating edge matrices $\mathcal{H}_{NN}$ and $\mathcal{W}_{uu}$ imply for 
$f_{a|b} f_b$, with the induced regression graph for $Y_a$ regressed on $Y_b$, as edge matrices 

$$\mathcal{P}_{a|b} = \mathrm{In}[\mathcal{K}_{ab} + \mathcal{K}_{aa} \mathcal{Q}_{ab} \mathcal{K}_{bb}], \qquad (8)$$ 

$$\mathcal{S}_{aa|b} = \mathrm{In}[\mathcal{K}_{aa} \mathcal{Q}_{aa} \mathcal{K}_{aa}^{T}], \qquad \mathcal{S}^{bb.a} = \mathrm{In}[\mathcal{H}_{bb}^{T} \mathcal{Q}_{bb} \mathcal{H}_{bb}]. \qquad (9)$$ 

Proof. Partial closure mimics transformations of partial inversion such that all elements 
of the induced matrices are non-negative. The zero entries in equations (3), (4) coincide 
with those in (8), (9); nonzero entries in the former correspond to ones in the latter; see 
Lemma 3 of Marchetti and Wermuth (2009) for more detail. □ 

Of the active paths, defined for Lemma 2 and needed to decide for uncoupled pairs 
i, k of $G^N_{\mathrm{reg}}$ whether they are coupled in the induced graph, some remain uncoupled after applying 
$\mathrm{zer}_a \mathcal{H}_{NN}$ and $\mathrm{zer}_b \mathcal{W}_{uu}$ but get closed with the non-negative sums of edge matrix 
products in (8), (9). Thus, as with partial closure, no edges ever get removed with the latter 
types of graph transformations so that set transitivity is used implicitly. 

For $N = (a, b)$ as for Proposition 2, let $\circ_a$ denote nodes in a and $\circ_b$ nodes in b. 

Corollary 1. For i, k the endpoints of paths that are edge-inducing for the induced graph, there 
are three types of ik-path uncoupled in the graph having edge matrices $\mathcal{K}_{NN}$ and $\mathcal{Q}_{uu}$, 

$$i \leftarrow \circ_a \text{ - - - } \circ_b \leftarrow k, \qquad i \leftarrow \circ_a \text{ - - - } \circ_a \rightarrow k, \qquad i \rightarrow \circ_b \text{ - - - } \circ_b \leftarrow k,$$ 

which are closed with the induced edge matrices $\mathcal{P}_{a|b}$, $\mathcal{S}_{aa|b}$, $\mathcal{S}^{bb.a}$, respectively, in (8), (9). 

After remembering the types of edge at the path endpoints, we have with $\mathcal{P}_{a|b}$ an 
induced bipartite graph of arrows pointing from b to a, with $\mathcal{S}_{aa|b}$ an induced covariance 
graph, and with $\mathcal{S}^{bb.a}$ an induced concentration graph. 

Lemma 8. Edge matrices induced by $G^N_{\mathrm{reg}}$ for $f_{\alpha|\beta c} f_{\beta|c}$. The subgraph induced by 
nodes $\alpha \cup \beta$ in the induced regression graph captures the independence implications of $G^N_{\mathrm{reg}}$ for $f_{\alpha|\beta c} f_{\beta|c}$. 

Proof. By the interpretation of the edge matrix components $\mathcal{P}_{a|b}$, $\mathcal{S}_{aa|b}$, $\mathcal{S}^{bb.a}$, no edges 
are induced by taking from them the submatrices with rows and columns in $\alpha \cup \beta$. 
Jointly, these edge submatrices define the subgraph induced by $\alpha \cup \beta$ in the induced regression graph. □ 

The induced graph in node set $\alpha \cup \beta$ and the induced regression graph in node set N are examples of 
independence-predicting graphs, in contrast to independence-preserving graphs such as 
the ribbonless graphs of Sadeghi and Lauritzen (2012) and the different types of 
Markov-equivalent graphs, such as summary graphs. With independence-preserving graphs, 
one can derive effects of additional marginalizing and conditioning in the starting graph 
while independence-predicting graphs can, in general, only be used to decide on 
edges present or missing in the induced graph. 

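Proposition 3. Edge criteria for implied independences and dependences. A 
dependence base $G^N_{\mathrm{reg}}$ implies $\alpha \perp\!\!\!\perp \beta \mid c$ if $\mathcal{P}_{\alpha|\beta.c} = 0$ and it implies $\alpha \pitchfork \beta \mid c$ if $\mathcal{P}_{\alpha|\beta.c} \neq 0$. 

Proof. The statement results with Lemma 7, equation (8) and Lemma 8. □ 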

Distributions satisfying all and only the independences captured by $G^N_{\mathrm{reg}}$. 
A given distribution is said to be faithful to a graph if every one of its independence 
constraints is captured by a given independence graph; see Spirtes, Glymour and Scheines 
(1993). For a distribution to be faithful to $G^N_{\mathrm{reg}}$, it has to satisfy the properties needed 
for the graph transformations of Proposition 3, that is properties (i) to (vii). 

Corollary 2. Distributions that are faithful to $G^N_{\mathrm{reg}}$. For a distribution with 
density $f_N$ generated over a dependence base $G^N_{\mathrm{reg}}$, the following statements are equivalent: 

(i) the distribution is faithful to $G^N_{\mathrm{reg}}$, 

(ii) every independence and every dependence statement implied by $G^N_{\mathrm{reg}}$ holds for $f_N$, 
(iii) $f_N$ satisfies as additional properties: composition, intersection and set transitivity, 
(iv) $f_N$ can be generated as a traceable regression without any path cancellations. 

Thus, faithfulness imposes in general an additional strong condition on traceable 
sequences of regressions. Exceptions are, for instance, directed acyclic graphs in which 
each response has only one parent. But the most common situation in observational and 
in interventional studies is to have two or more regressors influencing a response. Thus, 
for using regression graphs to interpret such structures or to plan future studies with 
a subset of the variables in a subpopulation, it is not sensible to assume that a given 
distribution is faithful to a regression graph. One needs to have traceable regressions, 
though, and should investigate reasons for path cancellations if they happen to occur. 
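
A path cancellation can be illustrated with a linear system generated over the complete directed acyclic graph of Figure 5a). The sketch below is a constructed example under stated assumptions, not taken from the paper: residual variances are set to one rather than standardizing the variables, and the coefficient values are chosen so that the direct effect of $Y_3$ on $Y_1$ is exactly offset by the effect along the path via $Y_2$.

```python
import numpy as np

# Hypothetical coefficients in
#   Y1 = alpha * Y2 + delta * Y3 + e1,  Y2 = gamma * Y3 + e2,  Y3 = e3.
alpha, gamma = 0.5, 0.4
delta = -alpha * gamma  # chosen so that the two paths from 3 to 1 cancel

# Upper-triangular matrix of equation parameters, A Y = e.
A = np.array([[1.0, -alpha, -delta],
              [0.0, 1.0, -gamma],
              [0.0, 0.0, 1.0]])

# Covariance matrix of Y for uncorrelated residuals with unit variances.
A_inv = np.linalg.inv(A)
Sigma = A_inv @ A_inv.T

cov_13 = Sigma[0, 2]                  # marginal covariance of Y1 and Y3
conc_13 = np.linalg.inv(Sigma)[0, 2]  # concentration of Y1, Y3 given Y2
print(cov_13, conc_13)
```

All three edges of the generating graph carry nonzero parameters, yet the marginal covariance of $Y_1$ and $Y_3$ vanishes. Such a distribution is not faithful to the graph, because it satisfies an independence that the graph does not imply.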

6 Discussion 

Sequences of regressions in joint responses permit one to model changes in several response 
components occurring at the same time when there is an intervention. This contrasts 
with interventions in sequences of regressions in single responses and in other types of 
chain graph models. 

We have identified properties of sequences of regressions in essentially arbitrary joint 
and single response variables and named them traceable regressions. A corresponding 
regression graph, $G^N_{\mathrm{reg}}$, is a dependence base of the joint distribution in addition to 
capturing the independences in the regressions. One knows now that the independence 
structure of such traceable regressions can differ from the implications derived in terms 
of its generating regression graph only when there are path cancellations. 

The consequences derivable with a graph give changes in structure that result in 
families of distributions generated over the graph while one may not be able to generalize 
to this family from the structure that one can see for a distribution with one given set 
of parameters, for instance as estimated in a sample. 

Sequences of traceable regressions and a regression graph $G^N_{\mathrm{reg}}$ have implications for a 
regression of $Y_a$ on $Y_b$ and dependences of $Y_b$ alone when these are based on a reordered 
node set $N = (a, b)$ that can be expressed with transformed edge matrix components 
of $G^N_{\mathrm{reg}}$. When marginalizing over m in $a = \alpha \cup m$ and conditioning on c in $b = \beta \cup c$, 
the specific implications of $G^N_{\mathrm{reg}}$ for the conditional densities $f_{\alpha|\beta c}$ and $f_{\beta|c}$ can now be 
derived with a subgraph induced by $\alpha \cup \beta$ in this transformed graph. An edge matrix 
criterion instead of a path criterion gives the global Markov property of $G^N_{\mathrm{reg}}$ and detects, 
in addition, induced dependences when $G^N_{\mathrm{reg}}$ is a dependence base for $f_N$. 

Many new questions have opened up. These include types of conditions on a given 
distribution under which it represents a traceable regression, conditions on independence- 
predicting graphs which assure that they are also independence-preserving, applications 
such as the special details needed to improve existing methods for meta-analyses, or 
computational aspects, such as conditions under which one type of several equivalent 
graph transformations becomes computationally much less intensive than others. 